
KAFKA-13149: fix NPE for record==null when handling a produce request #11080

Merged: 3 commits into apache:trunk on Sep 14, 2021

Conversation

@ccding (Contributor) commented on Jul 19, 2021

This code in `DefaultRecord.readFrom`:

int sizeOfBodyInBytes = ByteUtils.readVarint(buffer);
if (buffer.remaining() < sizeOfBodyInBytes)
return null;

returns `record = null`, which subsequently causes a NullPointerException at
if (!record.hasMagic(batch.magic)) {

This PR makes the broker throw an `InvalidRecordException` and notify the client instead of hitting the NPE. The fix is similar to the existing validation later in the same method:

int numHeaders = ByteUtils.readVarint(buffer);
if (numHeaders < 0)
throw new InvalidRecordException("Found invalid number of record headers " + numHeaders);
final Header[] headers;
if (numHeaders == 0)
headers = Record.EMPTY_HEADERS;
else
headers = readHeaders(buffer, numHeaders);
// validate whether we have read all header bytes in the current record
if (buffer.position() - recordStart != sizeOfBodyInBytes)
throw new InvalidRecordException("Invalid record size: expected to read " + sizeOfBodyInBytes +
" bytes in record payload, but instead read " + (buffer.position() - recordStart));
return new DefaultRecord(sizeInBytes, attributes, offset, timestamp, sequence, key, value, headers);
} catch (BufferUnderflowException | IllegalArgumentException e) {
throw new InvalidRecordException("Found invalid record structure", e);
}

where we throw an `InvalidRecordException` when the record's integrity is broken.
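To make the original failure path concrete, here is a simplified sketch of the per-record validation that dereferences the null record; it is illustrative only, not the actual broker code, and the class and method names are made up:

```java
import org.apache.kafka.common.InvalidRecordException;
import org.apache.kafka.common.record.Record;
import org.apache.kafka.common.record.RecordBatch;

// Illustrative only: not the actual broker validation code.
final class ProduceValidationSketch {
    // Iterating a batch ends up calling DefaultRecord.readFrom for each record.
    // If readFrom returns null for a truncated record, the first dereference
    // below throws a NullPointerException instead of a clear, client-facing error.
    static void validateBatch(RecordBatch batch) {
        for (Record record : batch) {
            if (!record.hasMagic(batch.magic()))   // NPE here when record == null
                throw new InvalidRecordException("Record magic does not match batch magic");
        }
    }
}
```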

@ccding changed the title from "[WIP] fix NPE when record==null in append" to "fix NPE when record==null in append" on Jul 19, 2021
@ccding (Contributor, Author) commented on Jul 19, 2021

This PR is ready for review

The change under review:

-return null;
+throw new InvalidRecordException("Invalid record size: expected " + sizeOfBodyInBytes +
+    " bytes in record payload, but instead the buffer has only " + buffer.remaining() +
+    " remaining bytes.");
@ijuma (Contributor) commented:
Is this really an exceptional case? Don't we do reads where we don't know exactly where the read ends and hence will trigger this path?

@ccding (Contributor, Author) commented on Jul 20, 2021

Are you referring to the case where we have not yet finished reading the request? I didn't see a retry path, but it will cause a NullPointerException at

if (!record.hasMagic(batch.magic)) {

What do you suggest I do here?

@ijuma (Contributor) commented:

I think the intent here was to cover the case where an incomplete record is returned by the broker. However, we have broker logic to try and avoid this case since KIP-74:

} else if (!hardMaxBytesLimit && readInfo.fetchedData.firstEntryIncomplete) {
            // For FetchRequest version 3, we replace incomplete message sets with an empty one as consumers can make
            // progress in such cases and don't need to report a `RecordTooLargeException`
            FetchDataInfo(readInfo.fetchedData.fetchOffsetMetadata, MemoryRecords.EMPTY)

@hachikuji Do you remember if there is still a reason to return null here instead of the exception @ccding is proposing?

@ccding (Contributor, Author) commented on Jul 20, 2021

> the case where an incomplete record is returned by the broker

I am referring to the produce API for the NullPointerException. The record comes from a producer, and the InvalidRecordException will trigger an error response to that producer.

If the fetch path requires a different return value, I guess the problem becomes more complicated.
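For illustration, a rough sketch of how this would surface on the producer side, assuming the broker maps the `InvalidRecordException` to the `INVALID_RECORD` error in the produce response; the topic name, bootstrap server, and class name here are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.InvalidRecordException;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProduceErrorSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test-topic", "key", "value"), (metadata, exception) -> {
                // With the proposed change, a malformed batch comes back as a clear
                // error instead of an unexplained NullPointerException on the broker.
                if (exception instanceof InvalidRecordException)
                    System.err.println("Broker rejected the batch: " + exception.getMessage());
            });
        }
    }
}
```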

@ijuma (Contributor) commented:

Yes, I understand you're talking about the producer case. I am talking about the fetch case. As I said, I think we may not need that special logic anymore, but @hachikuji would know for sure.

@ccding (Contributor, Author) commented:

@hachikuji do you have time to have a look at this?

@hachikuji commented on Sep 13, 2021

Apologies for the delay here. I don't see a problem with the change. I believe that @ijuma is right that the fetch response may still return incomplete data, but I think this is handled in ByteBufferLogInputStream. We stop batch iteration early if there is incomplete data, so we would never reach the readFrom here which is called for each record in the batch. It's worth noting also that the only caller of this method (in DefaultRecordBatch.uncompressedIterator) has the following logic:

try {
  return DefaultRecord.readFrom(buffer, baseOffset, firstTimestamp, baseSequence, logAppendTime);
} catch (BufferUnderflowException e) {
  throw new InvalidRecordException("Incorrect declared batch size, premature EOF reached");
}

So it already handles underflows in a similar way.
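In other words (a simplified, self-contained sketch of that early-stop idea, not the actual `ByteBufferLogInputStream` code; the length-prefix framing here is an assumption made for illustration):

```java
import java.nio.ByteBuffer;

// Illustrative sketch of why a truncated tail batch never reaches per-record
// parsing: iteration stops as soon as the buffer no longer holds a full batch.
final class EarlyStopSketch {
    // Assume each batch is framed as [4-byte size][size bytes of batch data].
    // Returns a slice with the next complete batch, or null if the buffer ends
    // mid-batch, in which case the caller simply stops iterating.
    static ByteBuffer nextCompleteBatch(ByteBuffer buffer) {
        if (buffer.remaining() < Integer.BYTES)
            return null;                                    // not even a size prefix left
        int declaredSize = buffer.getInt(buffer.position());
        if (buffer.remaining() < Integer.BYTES + declaredSize)
            return null;                                    // truncated batch: stop early
        buffer.position(buffer.position() + Integer.BYTES);
        ByteBuffer batch = buffer.slice();
        batch.limit(declaredSize);
        buffer.position(buffer.position() + declaredSize);
        return batch;
    }
}
```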

@ijuma (Contributor) commented:

Thanks for checking @hachikuji.

@ccding changed the title from "fix NPE when record==null in append" to "KAFKA-13149: fix NPE for record==null when handling a produce request" on Jul 29, 2021
@hachikuji left a comment:

LGTM

@hachikuji commented:
@ccding I kicked off a new build since it has been a while since the PR was submitted. Assuming tests are ok, I will merge shortly. Thanks for your patience.

@ccding (Contributor, Author) commented on Sep 14, 2021

The build Jason kicked off failed two tests:

Build / JDK 8 and Scala 2.12 / testDescribeTopicsWithIds() – kafka.api.PlaintextAdminIntegrationTest
Build / JDK 11 and Scala 2.13 / shouldQueryStoresAfterAddingAndRemovingStreamThread – org.apache.kafka.streams.integration.StoreQueryIntegrationTest

Both passed in my local run after merging trunk into this branch.

I am pushing the trunk merge to this branch to let Jenkins run again.

@hachikuji merged commit 75795d1 into apache:trunk on Sep 14, 2021
hachikuji pushed a commit that referenced this pull request Sep 14, 2021
…equests (#11080)

Raise `InvalidRecordException` from `DefaultRecordBatch.readFrom` instead of returning null if there are not enough bytes remaining to read the record. This ensures that the broker can raise a useful exception for malformed record batches.

Reviewers: Ismael Juma <[email protected]>, Jason Gustafson <[email protected]>
@ccding deleted the ak-record-null branch on September 16, 2021 at 14:22
xdgrulez pushed a commit to xdgrulez/kafka that referenced this pull request Dec 22, 2021
…equests (apache#11080)
